Ecology In Your Life - BIOL 230 W2021

Written by: Kaede Ito

Student Number: 58847666

Ecology Question

Given ocean temperatures off the coast of British Columbia, what could be the effect of the 2021 summer Western North America heat wave, and how could it have changed the community of algae species (diversity, abundance, establishment of new species)?

Data Sources

For this project, I need data on ocean surface or near-surface temperatures during the summer heatwave period (mid-June to late July) (source_link), and preferably also the coordinates where they were recorded. This means I need:

  • ocean temperatures (close to the surface)

  • time and date in the right range

  • coordinates

I’d also need some sort of anecdotal, observational, or measured information about the algae during that time, or, based on the known list of algae species that reside off the coast of British Columbia, their tolerable temperature ranges.

Ocean Temperatures

Underway meteorological, navigational, optical, physical, and time series data collected aboard NOAA Ship Ronald H. Brown in the Coastal Waters of Southeast Alaska and British Columbia, Columbia River estuary - Washington/Oregon and others from 2021-06-13 to 2021-07-26 (NCEI Accession 0240415)

Analysis

Analysis and visualization were done using R and various packages. The following script was used to generate two scatterplots.

Setup

library(tidyverse)
library(lubridate) 
library(ggplot2)
library(plotly)
options(repr.plot.width=10, repr.plot.height=6)

Reading and Wrangling Data

Temperature data is found in the "data" folder, while coordinates (and the time recorded) are in the "nav" folder inside the "data" folder.

Based on the metadata that was provided alongside the raw data (as broken down above), we need the external temperatures recorded in the SBE45-TSG-MSG_20210XXX-XXXXXX.Raw files and SST-TSG-Temp-Diff-MSG_20210XXX-XXXXXX.Raw files.

We also need the GPS data recorded in the Primary-GPS-GGA_XXX-XXXXXX.Raw files.

However, all of the data needs some preprocessing/cleaning up before it can be used in making the graphs.

Some major data cleaning was done with a Python script, located in the data_cleaning folder. The script uses no external packages and should run on Python 3.9 (other versions are untested), as long as it is pointed at the right data sources.
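The cleaning scripts themselves live in the data_cleaning folder; as a rough illustration of the kind of preprocessing involved (the field layout and example lines here are hypothetical, standard library only), a line filter might look like:

```python
import csv
import io

def keep_valid_rows(raw_text, n_fields):
    """Keep only lines with the expected field count and a numeric
    value in the last column (hypothetical field layout)."""
    cleaned = []
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) != n_fields:
            continue  # malformed line: wrong number of fields
        try:
            float(row[-1])  # last field assumed to be a temperature
        except ValueError:
            continue  # malformed line: non-numeric temperature
        cleaned.append(row)
    return cleaned

raw = ("06/13/2021,12:00:01,14.2\n"
       "06/13/2021,GARBLED\n"
       "06/13/2021,12:00:02,14.3\n")
print(keep_valid_rows(raw, 3))  # keeps the first and third lines
```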

clean_SBE45_data <- function(x) {
  read <- read_delim(x, delim = ",", 
                     col_names = c("date", 
                                    "time", 
                                    "int_temp", 
                                    "conductivity",
                                    "salinity",
                                    "sound_vel",
                                    "ext_temp")) %>%
    select(date, time, ext_temp)
  return(read)
}
clean_SST_TSG_data <- function(x) {
  read <- read_delim(x, delim = ",",
                     col_names = c("date",
                                   "time",
                                   "type",
                                   "diff",
                                   "ext_temp",
                                   "int_temp")) %>%
    select(date, time, ext_temp)
  return(read)
}
clean_temp_data <- function(x) {
  # https://stackoverflow.com/questions/10128617/test-if-characters-are-in-a-string
  if (grepl("SBE45-TSG-MSG", x, fixed = TRUE)) {
    return(clean_SBE45_data(x))
  } else {
    return(clean_SST_TSG_data(x))
  }
}
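The dispatch on the filename pattern can be sketched in Python as well (function names here are illustrative, not the real cleaners):

```python
def pick_cleaner(path):
    """Route a raw file to the matching cleaner based on its
    filename pattern, mirroring the grepl() check above."""
    if "SBE45-TSG-MSG" in path:
        return "clean_SBE45"      # SBE45 thermosalinograph messages
    return "clean_SST_TSG"        # SST/TSG temperature-difference messages

print(pick_cleaner("data/SBE45-TSG-MSG_20210613-120000.Raw"))  # clean_SBE45
print(pick_cleaner("data/SST-TSG-Temp-Diff-MSG_20210613-120000.Raw"))  # clean_SST_TSG
```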
clean_nav_data <- function(x) {
  read <- read_csv(x, 
                   col_names = c(
                     "date",
                     "time",
                     "type",
                     "time_num",
                     "lat",
                     "lat_NS",
                     "long",
                     "long_WE",
                     "gps_quality",
                     "num_sat_view",
                     "hort_dil",
                     "ant_alt",
                     "ant_alt_unit",
                     "geoidal",
                     "geoidal_unit",
                     "age_diff",
                     "diff_station",
                     "checksum"
                   )) %>%
    select(date, time, lat, lat_NS, long, long_WE) %>%
    # `x == NA` always yields NA, so use is.na() to catch missing flags
    mutate(long_WE = ifelse(long_WE == "" | is.na(long_WE), "W", long_WE)) %>%
    mutate(lat_NS = ifelse(lat_NS == "" | is.na(lat_NS), "N", lat_NS)) %>%
    mutate(lat_NS = as.factor(lat_NS), long_WE = as.factor(long_WE))
  return(read)
}

The dates are all of type character, which makes it hard to do anything useful with them. Therefore, the date and time must be converted into a proper date-time type.

format_datetime <- function(df) {
  df_new <- df %>%
    # https://www.neonscience.org/resources/learning-hub/tutorials/dc-time-series-subset-dplyr-r
    mutate(date = as.Date(date, format = '%m/%d/%Y')) %>%
    # https://www.tidyverse.org/blog/2021/03/clock-0-1-0/
    mutate(datetime = as.POSIXct(date, tz = "America/Vancouver")) %>%
    # hms() parses "HH:MM:SS" into a period, shifting the timestamp by the
    # full time of day (bare hour() + minute() would only add seconds)
    mutate(datetime = datetime + hms(time))

  return(df_new)
}
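As a cross-check, the intended date-plus-time-of-day combination can be sketched in Python (standard library only; timezone handling omitted, the %m/%d/%Y date format matching the raw files):

```python
from datetime import datetime, timedelta

def combine_date_time(date_str, time_str):
    """Parse 'MM/DD/YYYY' and 'HH:MM:SS' separately, then shift the
    date by the full time of day."""
    d = datetime.strptime(date_str, "%m/%d/%Y")
    t = datetime.strptime(time_str, "%H:%M:%S")
    return d + timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)

print(combine_date_time("06/13/2021", "17:00:20"))  # 2021-06-13 17:00:20
```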

We now have all of the functions needed to clean up a single file. However, there are quite a few files, and cleaning and assigning each one by hand would be cumbersome. Therefore, we iterate through all of the files and summarize. The data is summarized as follows:

  • mean temperature per minute of each day

  • mean latitude per minute of each day

  • mean longitude per minute of each day
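This per-minute summarizing amounts to bucketing readings by minute and averaging each bucket; a minimal sketch with toy data (Python, standard library):

```python
from collections import defaultdict

def per_minute_means(readings):
    """readings: list of ('HH:MM:SS' timestamp, value) pairs.
    Returns {'HH:MM': mean of values in that minute}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts[:5]].append(value)  # truncate the seconds -> minute key
    return {minute: sum(v) / len(v) for minute, v in buckets.items()}

readings = [("17:00:05", 14.0), ("17:00:40", 15.0), ("17:01:10", 16.0)]
print(per_minute_means(readings))  # {'17:00': 14.5, '17:01': 16.0}
```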

Attention

Please keep in mind that the following code blocks take a fairly long time to run.

# https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once
all_temperature_loaded <- list.files(path = "data/",
             pattern = "*.Raw",
             full.names = T) %>%
  map_df(~clean_temp_data(.))
head(all_temperature_loaded)
summary(all_temperature_loaded)
all_temperature <- all_temperature_loaded %>%
  filter(!is.na(ext_temp)) %>%
  filter(ext_temp > 2) %>%
  format_datetime() %>%
  group_by(datetime) %>%
  summarize(mean_ext = mean(ext_temp, na.rm = TRUE))
head(all_temperature)
summary(all_temperature)
 datetime             mean_ext
 2021-06-12 17:00:20  18.94975
 2021-06-12 17:00:21  18.81057
 2021-06-12 17:00:22  19.17494
 2021-06-12 17:00:23  19.26122
 2021-06-12 17:00:24  19.28124
 2021-06-12 17:00:25  19.20406
    datetime                      mean_ext    
 Min.   :2021-06-12 17:00:20   Min.   :10.17  
 1st Qu.:2021-06-23 17:00:13   1st Qu.:13.35  
 Median :2021-07-04 17:00:06   Median :14.92  
 Mean   :2021-07-04 07:00:13   Mean   :15.28  
 3rd Qu.:2021-07-15 11:00:20   3rd Qu.:16.71  
 Max.   :2021-07-25 17:01:16   Max.   :21.71  
all_nav_loaded <- list.files(path = "data/nav/",
                      pattern = "*.Raw",
                      full.names = T) %>%
  map_df(~clean_nav_data(.)) 
all_nav <- all_nav_loaded %>%
  format_datetime() %>%
  group_by(datetime, long_WE, lat_NS) %>%
  summarize(mean_lat = mean(lat), mean_long = mean(long)) %>%
  mutate(mean_lat = mean_lat/100, mean_long= mean_long/100)
head(all_nav)
summary(all_nav)
 datetime             long_WE  lat_NS  mean_lat  mean_long
 2021-06-12 17:00:16  NA       NA      32.41784  117.0942
 2021-06-12 17:00:17  NA       NA      32.41788  117.0941
 2021-06-12 17:00:18  NA       NA      32.41809  117.0938
 2021-06-12 17:00:19  NA       NA      32.41575  117.1053
 2021-06-12 17:00:20  NA       NA      32.40404  117.1328
 2021-06-12 17:00:21  NA       NA      32.39704  117.1738
    datetime                     long_WE             lat_NS         
 Min.   :2021-06-12 17:00:16   Length:3749        Length:3749       
 1st Qu.:2021-06-23 17:00:14   Class :character   Class :character  
 Median :2021-07-04 17:00:26   Mode  :character   Mode  :character  
 Mean   :2021-07-04 11:44:29                                        
 3rd Qu.:2021-07-15 17:00:34                                        
 Max.   :2021-07-25 17:01:16                                        
 NA's   :1                                                          
    mean_lat       mean_long    
 Min.   :31.32   Min.   :117.1  
 1st Qu.:33.60   1st Qu.:120.5  
 Median :37.50   Median :123.2  
 Mean   :39.37   Mean   :122.9  
 3rd Qu.:45.08   3rd Qu.:124.8  
 Max.   :52.21   Max.   :130.5  
 NA's   :457     NA's   :527    

Since we have the date and time (to the minute) of both the temperature readings and the ship’s coordinates, we can match the two on their shared datetimes.

joined_temp_nav <- inner_join(all_temperature, 
                             all_nav,
                             by = c("datetime" = "datetime"))
head(joined_temp_nav)
summary(joined_temp_nav)
 datetime             mean_ext  long_WE  lat_NS  mean_lat  mean_long
 2021-06-12 17:00:20  18.94975  NA       NA      32.40404  117.1328
 2021-06-12 17:00:21  18.81057  NA       NA      32.39704  117.1738
 2021-06-12 17:00:22  19.17494  NA       NA      32.39495  117.1988
 2021-06-12 17:00:23  19.26122  NA       NA      32.39151  117.2244
 2021-06-12 17:00:24  19.28124  NA       NA      32.39131  117.2250
 2021-06-12 17:00:25  19.20406  NA       NA      32.39111  117.2255
    datetime                      mean_ext       long_WE         
 Min.   :2021-06-12 17:00:20   Min.   :10.17   Length:3744       
 1st Qu.:2021-06-23 17:00:17   1st Qu.:13.37   Class :character  
 Median :2021-07-04 17:00:28   Median :14.98   Mode  :character  
 Mean   :2021-07-04 12:17:59   Mean   :15.31                     
 3rd Qu.:2021-07-15 17:00:35   3rd Qu.:16.77                     
 Max.   :2021-07-25 17:01:16   Max.   :21.71                     
                                                                 
    lat_NS             mean_lat       mean_long    
 Length:3744        Min.   :31.32   Min.   :117.1  
 Class :character   1st Qu.:33.63   1st Qu.:120.5  
 Mode  :character   Median :37.50   Median :123.2  
                    Mean   :39.38   Mean   :122.9  
                    3rd Qu.:45.08   3rd Qu.:124.8  
                    Max.   :52.21   Max.   :130.5  
                    NA's   :456     NA's   :526    
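The inner_join() above keeps only the rows whose datetime appears in both tables. A minimal sketch of the same idea (Python, toy data and illustrative keys):

```python
def inner_join(left, right):
    """Match two {key: value} tables on their shared keys, mimicking
    an inner join by "datetime"."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}

temps = {"17:00": 14.5, "17:01": 16.0, "17:02": 15.2}          # datetime -> temp
coords = {"17:00": (48.4, -123.3), "17:02": (48.5, -123.4)}    # datetime -> (lat, long)
joined = inner_join(temps, coords)
print(sorted(joined))  # ['17:00', '17:02'] -- '17:01' has no coordinates, so it drops out
```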

Visualize the Data

time_plot <- ggplot(all_temperature, aes(x = datetime, 
                                         y = mean_ext, 
                                         colour = mean_ext)) +
  geom_point() +
  scale_colour_gradient(low = "blue", high = "red") +
  labs(x = "Date and Time (PST)",
       y = "Mean (per minute) Ocean Temperature (Celsius)",
       colour = "Mean External Temperature")
time_plot
[Figure: scatterplot of mean external ocean temperature over time]
p <- plot_ly(joined_temp_nav,
             x = ~mean_lat,
             y = ~mean_long,
             z = ~mean_ext,
             color = ~mean_ext) %>%
  add_markers(size = 0.7)
embed_notebook(p)